Table Representation Learning Workshop

Tables are a promising modality for representation learning and generative models, with too much application potential to ignore. However, tables have long been overlooked despite their dominant presence in the data landscape, e.g. in data management and analysis pipelines. The majority of datasets in Google Dataset Search, for example, resemble typical tabular file formats like CSV. Similarly, the three most-used database management systems are all intended for relational data. Representation learning for tables, possibly combined with other modalities such as code and text, has shown impressive performance for tasks like semantic parsing, question answering, table understanding, data preparation, and data analysis (e.g. text-to-SQL). The pre-training paradigm has been shown to be effective for tabular ML (classification/regression) as well. More recently, we also observe promising potential in applying and enhancing LLMs for structured data, to improve how we process such data and derive insights from it.

The Table Representation Learning (TRL) workshop is the premier venue in this emerging research area and has three main goals:

(1) Motivate tables as a primary modality for representation and generative models and advance the area further.
(2) Showcase impactful applications of pretrained table models and identify open challenges for future research, with a particular focus on industry insights in 2024.
(3) Foster discussion and collaboration across the ML, NLP, IR and DB communities.

When: Saturday 14 December 2024.
Where: Vancouver, Canada.

Submit: 20 September 2024, https://openreview.net/group?id=NeurIPS.cc/2024/Workshop/TRL

Review for TRL'24 (much appreciated!): https://forms.gle/WARtyyaJSpGUC2B56

General questions: table-representation-learning-workshop@googlegroups.com

Specific questions: madelon@berkeley.edu

Follow on Twitter: @TrlWorkshop

SAP

Call for Papers

Important Dates

Submission Open	September 1, 2024
Submission Deadline	September 20, 2024 (11:59PM AoE)
Notifications	October 9, 2024 (11:59PM AoE)
Camera-ready	October 30, 2024 (11:59PM AoE)
Slides for contributed talks	November 30, 2024 (11:59PM AoE)
Video pitches for posters (optional)	November 30, 2024 (11:59PM AoE)
Workshop Date	December 14, 2024

Scope

We invite submissions on representation and generative learning over tables, related to any of the following topics:

Representation Learning for (semi-)Structured Data, such as spreadsheets, tables, and full relational databases. Example contributions are new model architectures, data encoding techniques, tailored tokenization methods, pre-training and fine-tuning techniques, etc.
Generative Models and LLMs for Structured Data, such as Large Language Models (LLMs) and diffusion models, and specialized techniques for prompt engineering, single-task and multi-task fine-tuning, LLM-driven interfaces and multi-agent systems, retrieval-augmented generation, etc.
Multimodal Learning, where structured data is jointly embedded or combined with other modalities such as text, images, code (e.g., SQL), knowledge graphs, and visualizations.
Applications of TRL models: table representations for tasks like data preparation (e.g. data cleaning, validation, integration, cataloging, feature engineering), retrieval (e.g. data search, fact-checking/QA, KG alignment), analysis (e.g. text-to-SQL and visualization), tabular data generation, (end-to-end) tabular machine learning, table extraction (e.g. parsers/extraction for unstructured data), and query optimization (e.g. cardinality estimation).
Challenges of TRL models in production: work addressing the challenges of maintaining and managing TRL models in fast-evolving contexts, e.g., data updating, error correction and monitoring, handling data privacy, personalization performance, etc.
Domain-specific challenges for learned table models, which often arise in domains such as enterprise, finance, medicine, and law. These challenges pertain to table content, table structure, privacy, security limitations, and other factors that necessitate tailored solutions.
Benchmarks, analyses, and datasets for TRL, including assessing LLMs and other generative models as base models versus alternative approaches, analysis of model robustness with respect to large, messy, and heterogeneous tabular data, etc.
Other contributions, such as surveys, demonstrations, visions, and reflections on table representation learning and generative models for structured data.

Organization

Workshop Chairs

Madelon Hulsebos

UC Berkeley

Haoyu Dong

Microsoft

Laurel Orr

Numbers Station AI

Qian Liu

Sea AI Lab

Vadim Borisov

University of Tübingen

Program

TRL is again entirely in-person, and this year will feature two poster sessions and contributed talks. We also host a few exciting invited talks on established research in this emerging area, and a panel discussion focused on industry and startup perspectives.

Invited Speakers

Yasemin Altun

Google DeepMind

Binyuan Hui

Qwen Team, Alibaba

Mirella Lapata

University of Edinburgh

Gaël Varoquaux

Inria, Probabl

Matei Zaharia

UC Berkeley, Databricks

Panelists (tentative and TBC)

Ines Chami

Numbers Station

Binyuan Hui

Qwen Team, Alibaba

Douwe Kiela

Contextual AI

Maithra Raghu

Samaya AI

Submission Guidelines

Submission link

Submit your (anonymized) paper through OpenReview at: https://openreview.net/group?id=NeurIPS.cc/2024/Workshop/TRL
Please be aware that accepted papers are expected to be presented at the workshop in-person.

Formatting guidelines

The workshop accepts regular research papers and industrial papers of the following types:

Short paper: 4 pages + references and appendix.

Regular paper: 8 pages + references and appendix.

Submissions should be anonymized and follow the NeurIPS style files (zip), but can exclude the checklist. Non-anonymous preprints are no problem, and artifacts do not have to be anonymized. Just submitting the paper without author names/affiliations is sufficient. Supplementary material, if any, may be added in the appendix. The footer of accepted papers should state “Table Representation Learning Workshop at NeurIPS 2024”. We expect authors to adopt an inclusive and diverse writing style. The “Diversity and Inclusion in Writing” guide by the DE&I in DB Conferences effort is a good resource.

Review process

Papers will receive light reviews in a double-anonymous manner. All accepted submissions will be published on the website and made public on OpenReview but the workshop is non-archival (i.e. without proceedings).

Novelty and conflicts

The workshop cannot accept submissions that have been published at NeurIPS or other machine learning venues as-is, but we do invite relevant papers from the main conference (NeurIPS) to be submitted to the workshop as 4-page short papers. We also welcome submissions that have been published in, for example, data management or natural language processing venues. We rely on OpenReview for handling conflicts, so please ensure that the conflicts in every author's OpenReview profile are complete, in particular, with respect to the organization and program committees.

Camera-ready instructions

Camera-ready papers are expected to list the authors and affiliations on the first page, and state "Table Representation Learning Workshop at NeurIPS 2024" in the footer. The camera-ready version may exceed the page limit for acknowledgements or small content changes, but revision is not required (for short papers: please be aware of novelty requirements of archival venues, e.g. SIGMOD, CVPR). The camera-ready version should be submitted through OpenReview (submission -> edit -> revision), and will be published on OpenReview and this website. Please make sure that all metadata is correct as well, as it will be imported to the NeurIPS website.

Presentation instructions

All accepted papers will be presented as a poster during one of the poster sessions (TBA). For poster formatting, please refer to the poster instructions on the NeurIPS site. You can print and bring the poster yourself or consider the FedEx offer for NeurIPS. Optional: authors of poster submissions are also invited to send a teaser video of approx. 3 minutes (.mp4) to madelon@berkeley.edu, which will be hosted on the website and YouTube channel of the workshop.
Authors of papers selected for spotlight talks are also asked to prepare a talk of 9 minutes (+1 min Q&A), and upload their slides through the "slides" field in OpenReview. Timeslots for the spotlights will be published soon. The recordings of oral talks will be published as well.

Program Committee: TBC

We are very grateful to all below members of the Program Committee!
Wenhu Chen, University of Waterloo
Mukul Singh, Microsoft
Sercan O Arik, Google
Micah Goldblum, New York University
Andreas Muller, Microsoft
Xi Fang, Yale University
Naihao Deng, University of Michigan
Sebastian Schelter, BIFOLD & TU Berlin
Weijie Xu, Amazon
Rajat Agarwal, Amazon
Sharad Chitlangia, Amazon
Lei Cao, University of Arizona
Paul Groth, University of Amsterdam
Alex Zhuang, University of Waterloo
Sepanta Zeighami, University of California, Berkeley
Jayoung Kim, Yonsei University
Jaehyun Nam, KAIST
Sascha Marton, University of Mannheim
Tianji Cong, University of Michigan
Myung Jun Kim, Inria
Aneta Koleva, University of Munich
Peter Baile Chen, MIT
Gerardo Vitagliano, MIT
Reynold Cheng, the University of Hong Kong
Till Döhmen, MotherDuck / University of Amsterdam
Ivan Rubachev, Higher School of Economics
Raul Castro Fernandez, University of Chicago
Peng Shi, University of Waterloo
Paolo Papotti, Eurecom
Carsten Binnig, TU Darmstadt / Google
Tianyang Liu, University of California, San Diego
Tianbao Xie, the University of Hong Kong
Jintai Chen, University of Illinois at Urbana-Champaign
Sebastian Bordt, Eberhard-Karls-Universität Tübingen
Panupong Pasupat, Google
Liangming Pan, University of Arizona
Xinyuan Lu, National University of Singapore
Ziyu Yao, George Mason University
Shuhan Zheng, Hitachi, Ltd.
Shuaichen Chang, Amazon
Julian Martin Eisenschlos, Google DeepMind
Noah Hollmann, Albert-Ludwigs-Universität Freiburg
Linyong Nan, Yale University
Tianshu Zhang, Ohio State University
Liane Vogel, Technische Universität Darmstadt
Roman Levin, Amazon
Henry Gouk, University of Edinburgh
Yury Gorishniy, Moscow Institute of Physics and Technology
Edward Choi, KAIST
Gyubok Lee, KAIST
Mingyu Zheng, University of Chinese Academy of Sciences
Tassilo Klein, SAP
Ge Qu, the University of Hong Kong
Artem Babenko, Yandex
Shreya Shankar, University of California Berkeley
Xiang Deng, Google
Zhoujun Cheng, UC San Diego
Mengyu Zhou, Microsoft Research
Mira Moukheiber, MIT
Niklas Wretblad, Linköping University
Gust Verbruggen, Microsoft
Amine Mhedhbi, Polytechnique Montréal

Accepted Papers

2024

Your Paper?

2023

Oral

MultiTabQA: Generating Tabular Answers for Multi-Table Question Answering
Vaishali Pal, Andrew Yates, Evangelos Kanoulas, Maarten Rijke
GCondNet: A Novel Method for Improving Neural Networks on Small High-Dimensional Tabular Data
Andrei Margeloiu, Nikola Simidjievski, Pietro Lió, Mateja Jamnik
High-Performance Transformers for Table Structure Recognition Need Early Convolutions
ShengYun Peng, Seongmin Lee, Xiaojing Wang, Rajarajeswari Balasubramaniyan, Duen Horng Chau
Self-supervised Representation Learning from Random Data Projectors
Yi Sui, Tongzi Wu, Jesse Cresswell, Ga Wu, George Stein, Xiao Shi Huang, Xiaochen Zhang, Maksims Volkovs
HyperFast: Instant Classification for Tabular Data
David Bonet, Daniel Mas Montserrat, Xavier Giró-i-Nieto, Alexander Ioannidis
Training-Free Generalization on Heterogeneous Tabular Data via Meta-Representation
Han-Jia Ye, Qile Zhou, De-Chuan Zhan
Tabular Representation, Noisy Operators, and Impacts on Table Structure Understanding Tasks in LLMs
Ananya Singha, José Cambronero, Sumit Gulwani, Vu Le, Chris Parnin
Data Ambiguity Strikes Back: How Documentation Improves GPT's Text-to-SQL
Zachary Huang, Pavan Kalyan Damalapati, Eugene Wu
IngesTables: Scalable and Efficient Training of LLM-Enabled Tabular Foundation Models
Scott Yak, Yihe Dong, Javier Gonzalvo, Sercan Arik
Pool-Search-Demonstrate: Improving Data-wrangling LLMs via better in-context examples
Joon Suk Huh, Changho Shin, Elina Choi
How to Prompt LLMs for Text-to-SQL: A Study in Zero-shot, Single-domain, and Cross-domain Settings
Shuaichen Chang, Eric Fosler-Lussier
TabPFGen – Tabular Data Generation with TabPFN
Jeremy (Junwei) Ma, Apoorv Dankar, George Stein, Guangwei Yu, Anthony Caterini

Poster

Generating Data Augmentation Queries Using Large Language Models
Christopher Buss, Jasmin Mousavi, Mikhail Tokarev, Arash Termehchy, David Maier, Stefan Lee
ReConTab: Regularized Contrastive Representation Learning for Tabular Data
Suiyao Chen, Jing Wu, Naira Hovakimyan, Handong Yao
Unlocking the Transferability of Tokens in Deep Models for Tabular Data
Qile Zhou, Han-Jia Ye, Leye Wang, De-Chuan Zhan
Augmentation for Context in Financial Numerical Reasoning over Textual and Tabular Data with Large-Scale Language Model (recording)
Yechan Hwang, Jinsu Lim, Young-Jun Lee, Ho-Jin Choi
TabContrast: A Local-Global Level Method for Tabular Contrastive Learning
Hao Liu, Yixin Chen, Bradley A Fritz, Christopher King
Explaining Explainers: Necessity and Sufficiency in Tabular Data
Prithwijit Chowdhury, Mohit Prabhushankar, Ghassan AlRegib
Beyond Individual Input for Deep Anomaly Detection on Tabular Data
Hugo Thimonier, Fabrice Popineau, Arpad Rimmel, Bich-Liên Doan
GradTree: Learning Axis-Aligned Decision Trees with Gradient Descent
Sascha Marton, Stefan Lüdtke, Christian Bartelt, Heiner Stuckenschmidt
Elephants Never Forget: Testing Language Models for Memorization of Tabular Data
Sebastian Bordt, Harsha Nori, Rich Caruana
InterpreTabNet: Enhancing Interpretability of Tabular Data Using Deep Generative Models and Large Language Models
Jacob Yoke Hong Si, Rahul Krishnan, Michael Cooper, Wendy Yusi Cheng
On Incorporating new Variables during Evaluation
Harsimran Bhasin, Soumyadeep Ghosh
Unnormalized Density Estimation with Root Sobolev Norm Regularization
Mark Kozdoba, Binyamin Perets, Shie Mannor
Tree-Regularized Tabular Embeddings
Xuan Li, Yun Wang, Bo Li
Binning as a Pretext Task: Improving Self-Supervised Learning in Tabular Domains
Kyungeun Lee, Ye Seul Sim, Hyeseung Cho, Suhee Yoon, Sanghyu Yoon, Woohyung Lim
A Deep Learning Blueprint for Relational Databases (recording)
Lukáš Zahradník, Jan Neumann, Gustav Šír
Scaling TabPFN: Sketching and Feature Selection for Tabular Prior-Data Fitted Networks
Benjamin Feuer, Niv Cohen, Chinmay Hegde
Modeling string entries for tabular data prediction: do we need big large language models?
Leo Grinsztajn, Myung Jun Kim, Edouard Oyallon, Gael Varoquaux
Hopular: Modern Hopfield Networks for Tabular Data
Bernhard Schäfl, Lukas Gruber, Angela Bitto, Sepp Hochreiter
NeuroDB: Efficient, Privacy-Preserving and Robust Query Answering with Neural Networks
Sepanta Zeighami, Cyrus Shahabi
A DB-First approach to query factual information in LLMs
Mohammed Saeed, Nicola De Cao, Paolo Papotti
A Performance-Driven Benchmark for Feature Selection in Tabular Deep Learning
Valeriia Cherepanova, Roman Levin, Gowthami Somepalli, Jonas Geiping, C. Bayan Bruss, Andrew Wilson, Tom Goldstein, Micah Goldblum
Incorporating LLM Priors into Tabular Learners
Max Zhu, Siniša Stanivuk, Andrija Petrovic, Mladen Nikolic, Pietro Lió
CHORUS: Foundation Models for Unified Data Discovery and Exploration
Moe Kayali, Anton Lykov, Ilias Fountalis, Nikolaos Vasiloglou, Dan Olteanu, Dan Suciu
Introducing the Observatory Library for End-to-End Table Embedding Inference (recording)
Tianji Cong, Zhenjie Sun, Paul Groth, H. V. Jagadish, Madelon Hulsebos
Scaling Experiments in Self-Supervised Cross-Table Representation Learning
Maximilian Schambach, Dominique Paul, Johannes Otterbach
Benchmarking Tabular Representation Models in Transfer Learning Settings
Qixuan Jin, Talip Ucar
Exploring the Retrieval Mechanism for Tabular Deep Learning
Felix den Breejen, Sangmin Bae, Stephen Cha, Tae-Young Kim, Seoung Hyun Koh, Se-Young Yun
In Defense of Zero Imputation for Tabular Deep Learning
John Van Ness, Madeleine Udell
Multitask-Guided Self-Supervised Tabular Learning for Patient-Specific Survival Prediction
You Wu, Omid Bazgir, Yongju Lee, Tommaso Biancalani, James Lu, Ehsan Hajiramezanali
Testing the Limits of Unified Sequence to Sequence LLM Pretraining on Diverse Table Data Tasks
Soumajyoti Sarkar, Leonard Lausen

2022

Oral

Analysis of the Attention in Tabular Language Models (recording)
Aneta Koleva, Martin Ringsquandl, Volker Tresp
Transfer Learning with Deep Tabular Models (recording)
Roman Levin, Valeriia Cherepanova, Avi Schwarzschild, Arpit Bansal, C. Bayan Bruss, Tom Goldstein, Andrew Gordon Wilson, Micah Goldblum
STable: Table Generation Framework for Encoder-Decoder Models (recording)
Michał Pietruszka, Michał Turski, Łukasz Borchmann, Tomasz Dwojak, Gabriela Pałka, Karolina Szyndler, Dawid Jurkiewicz, Łukasz Garncarek
TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second (recording)
Noah Hollmann, Samuel Müller, Katharina Eggensperger, Frank Hutter
Towards Parameter-Efficient Automation of Data Wrangling Tasks with Prefix-Tuning (recording)
David Vos, Till Döhmen, Sebastian Schelter
RegCLR: A Self-Supervised Framework for Tabular Representation Learning in the Wild (recording)
Weiyao Wang, Byung-Hak Kim, Varun Ganapathi

Poster

SAINT: Improved Neural Networks for Tabular Data via Row Attention and Contrastive Pre-Training
Gowthami Somepalli, Avi Schwarzschild, Micah Goldblum, C. Bayan Bruss, Tom Goldstein
Generic Entity Resolution Models
Jiawei Tang, Yifei Zuo, Lei Cao, Samuel Madden
Towards Foundation Models for Relational Databases (video pitch)
Liane Vogel, Benjamin Hilprecht, Carsten Binnig
Diffusion models for missing value imputation in tabular data (video pitch)
Shuhan Zheng, Nontawat Charoenphakdee
STab: Self-supervised Learning for Tabular Data
Ehsan Hajiramezanali, Max W Shen, Gabriele Scalia, Nathaniel Lee Diamant
CASPR: Customer Activity Sequence based Prediction and Representation
Damian Konrad Kowalczyk, Pin-Jung Chen, Sahil Bhatnagar
Conditional Contrastive Networks
Emily Mu, John Guttag
Self-supervised Representation Learning Across Sequential and Tabular Features Using Transformers
Rajat Agarwal, Anand Muralidhar, Agniva Som, Hemant Kowshik
The Need for Tabular Representation Learning: An Industry Perspective
Joyce Cahoon, Alexandra Savelieva, Andreas C Mueller, Avrilia Floratou, Carlo Curino, Hiren Patel, Jordan Henkel, Markus Weimer, Roman Batoukov, Shaleen Deep, Venkatesh Emani, Richard Wydrowski, Nellie Gustafsson
STUNT: Few-shot Tabular Learning with Self-generated Tasks from Unlabeled Tables
Jaehyun Nam, Jihoon Tack, Kyungmin Lee, Hankook Lee, Jinwoo Shin
Tabular Data Generation: Can We Fool XGBoost?
EL Hacen Zein, Tanguy Urvoy
SiMa: Federating Data Silos using GNNs (video pitch)
Christos Koutras, Rihan Hai, Kyriakos Psarakis, Marios Fragkoulis, Asterios Katsifodimos
Self Supervised Pre-training for Large Scale Tabular Data
Sharad Chitlangia, Anand Muralidhar, Rajat Agarwal
RoTaR: Efficient Row-Based Table Representation Learning via Teacher-Student Training
Zui Chen, Lei Cao, Samuel Madden
MapQA: A Dataset for Question Answering on Choropleth Maps
Shuaichen Chang, David Palzer, Jialin Li, Eric Fosler-Lussier, Ningchuan Xiao
MET: Masked Encoding for Tabular Data
Kushal Alpesh Majmundar, Sachin Goyal, Praneeth Netrapalli, Prateek Jain
Active Learning with Table Language Models
Martin Ringsquandl, Aneta Koleva
Structural Embedding of Data Files with MAGRITTE (video pitch)
Gerardo Vitagliano, Mazhar Hameed, Felix Naumann

3rd Table Representation Learning Workshop @ NeurIPS 2024

14 December 2024, Vancouver, Canada.
